Overview

Dataset statistics

Number of variables: 20
Number of observations: 338592
Missing cells: 4734013
Missing cells (%): 69.9%
Duplicate rows: 309470
Duplicate rows (%): 91.4%
Total size in memory: 51.7 MiB
Average record size in memory: 160.0 B
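The two percentages can be re-derived from the raw counts. The following is a minimal sketch of that arithmetic, assuming (as the figures suggest) that the missing-cell percentage is taken over all cells (variables × observations) and the duplicate percentage over all rows.

```python
# Counts copied from the dataset statistics in this report.
n_variables = 20
n_observations = 338592
missing_cells = 4734013
duplicate_rows = 309470

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
```

Both results round to the reported figures (69.9% and 91.4%).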

Variable types

Categorical: 8
Numeric: 12

Warnings

Dataset has 309470 (91.4%) duplicate rows [Duplicates]
PayWideKumi6 is highly correlated with PayWidePay6 and 15 other fields [High correlation]
PayWidePay6 is highly correlated with PayWideKumi6 and 15 other fields [High correlation]
PayWideNinki6 is highly correlated with PayWideKumi6 and 15 other fields [High correlation]
PayWideKumi7 is highly correlated with PayWideKumi6 and 15 other fields [High correlation]
PayWidePay7 is highly correlated with PayWideKumi6 and 15 other fields [High correlation]
PayWideNinki7 is highly correlated with PayWideKumi6 and 15 other fields [High correlation]
PayUmatanKumi1 is highly correlated with PayWideKumi6 and 7 other fields [High correlation]
PayUmatanPay1 is highly correlated with PayWideKumi6 and 8 other fields [High correlation]
PayUmatanNinki1 is highly correlated with PayWideKumi6 and 7 other fields [High correlation]
PayUmatanPay2 is highly correlated with PayUmatanPay1 [High correlation]
PaySanrenpukuKumi1 is highly correlated with PayWideKumi6 and 7 other fields [High correlation]
PaySanrenpukuPay1 is highly correlated with PayWideKumi6 and 7 other fields [High correlation]
PaySanrenpukuNinki1 is highly correlated with PayWideKumi6 and 7 other fields [High correlation]
PaySanrenpukuKumi2 is highly correlated with PayWideKumi6 and 7 other fields [High correlation]
PaySanrenpukuPay2 is highly correlated with PayWideKumi6 and 7 other fields [High correlation]
PaySanrenpukuNinki2 is highly correlated with PayWideKumi6 and 7 other fields [High correlation]
PaySanrenpukuKumi3 is highly correlated with PayWideKumi6 and 15 other fields [High correlation]
PaySanrenpukuPay3 is highly correlated with PayWideKumi6 and 15 other fields [High correlation]
PaySanrenpukuPay3 is highly correlated with PayWideNinki6 and 6 other fields [High correlation]
PayWideNinki6 is highly correlated with PaySanrenpukuPay3 and 6 other fields [High correlation]
PayWideNinki7 is highly correlated with PaySanrenpukuPay3 and 6 other fields [High correlation]
PayWideKumi6 is highly correlated with PaySanrenpukuPay3 and 6 other fields [High correlation]
PaySanrenpukuKumi3 is highly correlated with PaySanrenpukuPay3 and 6 other fields [High correlation]
PayWidePay7 is highly correlated with PaySanrenpukuPay3 and 6 other fields [High correlation]
PayWideKumi7 is highly correlated with PaySanrenpukuPay3 and 6 other fields [High correlation]
PayWidePay6 is highly correlated with PaySanrenpukuPay3 and 6 other fields [High correlation]
PayWideKumi6 has 338561 (> 99.9%) missing values [Missing]
PayWidePay6 has 338561 (> 99.9%) missing values [Missing]
PayWideNinki6 has 338561 (> 99.9%) missing values [Missing]
PayWideKumi7 has 338561 (> 99.9%) missing values [Missing]
PayWidePay7 has 338561 (> 99.9%) missing values [Missing]
PayWideNinki7 has 338561 (> 99.9%) missing values [Missing]
PayUmatanKumi2 has 337507 (99.7%) missing values [Missing]
PayUmatanPay2 has 337507 (99.7%) missing values [Missing]
PayUmatanNinki2 has 337507 (99.7%) missing values [Missing]
PaySanrenpukuKumi2 has 337668 (99.7%) missing values [Missing]
PaySanrenpukuPay2 has 337668 (99.7%) missing values [Missing]
PaySanrenpukuNinki2 has 337668 (99.7%) missing values [Missing]
PaySanrenpukuKumi3 has 338561 (> 99.9%) missing values [Missing]
PaySanrenpukuPay3 has 338561 (> 99.9%) missing values [Missing]
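
The missing-value warnings are easier to read as counts of rows that actually carry a value. The sketch below derives those counts from the figures in the warnings; the 0.99 cutoff (`SPARSE_THRESHOLD`) is a hypothetical choice for illustration and is not part of the report.

```python
n_observations = 338592

# Missing-value counts copied from the warnings in this report.
missing_counts = {
    "PayWideKumi6": 338561, "PayWidePay6": 338561, "PayWideNinki6": 338561,
    "PayWideKumi7": 338561, "PayWidePay7": 338561, "PayWideNinki7": 338561,
    "PayUmatanKumi2": 337507, "PayUmatanPay2": 337507, "PayUmatanNinki2": 337507,
    "PaySanrenpukuKumi2": 337668, "PaySanrenpukuPay2": 337668,
    "PaySanrenpukuNinki2": 337668,
    "PaySanrenpukuKumi3": 338561, "PaySanrenpukuPay3": 338561,
}

# Rows that hold a value in each flagged column.
non_missing = {col: n_observations - m for col, m in missing_counts.items()}

# Hypothetical cutoff for "too sparse to use directly"; an assumption.
SPARSE_THRESHOLD = 0.99
sparse_columns = [
    col for col, m in missing_counts.items()
    if m / n_observations > SPARSE_THRESHOLD
]

for col in missing_counts:
    print(f"{col}: {non_missing[col]} non-missing rows")
```

Under these figures the sixth and seventh Wide payout columns and the third Sanrenpuku columns hold only 31 values each, which is why their per-variable statistics rest on so few rows.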

Reproduction

Analysis started: 2021-04-07 13:00:07.844126
Analysis finished: 2021-04-07 13:00:55.969382
Duration: 48.13 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml
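
The reported duration is the difference between the two timestamps. A small sketch of that check, using only the standard library:

```python
from datetime import datetime

# Timestamp layout used by the report (microsecond precision).
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2021-04-07 13:00:07.844126", FMT)
finished = datetime.strptime("2021-04-07 13:00:55.969382", FMT)

duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```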

Variables

PayWideKumi6
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 6.5%
Missing: 338561
Missing (%): > 99.9%
Memory size: 2.6 MiB

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 155
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 508.0
2nd row: 508.0
3rd row: 508.0
4th row: 508.0
5th row: 508.0

Value | Count | Frequency (%)
418.0 | 18 | < 0.1%
508.0 | 13 | < 0.1%
(Missing) | 338561 | > 99.9%

Histogram of lengths of the category

Value | Count | Frequency (%)
418.0 | 18 | 58.1%
508.0 | 13 | 41.9%

Most occurring characters

Value | Count | Frequency (%)
0 | 44 | 28.4%
8 | 31 | 20.0%
. | 31 | 20.0%
4 | 18 | 11.6%
1 | 18 | 11.6%
5 | 13 | 8.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 124 | 80.0%
Other Punctuation | 31 | 20.0%

Most frequent character per category

Value | Count | Frequency (%)
0 | 44 | 35.5%
8 | 31 | 25.0%
4 | 18 | 14.5%
1 | 18 | 14.5%
5 | 13 | 10.5%

Value | Count | Frequency (%)
. | 31 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 155 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
0 | 44 | 28.4%
8 | 31 | 20.0%
. | 31 | 20.0%
4 | 18 | 11.6%
1 | 18 | 11.6%
5 | 13 | 8.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 155 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
0 | 44 | 28.4%
8 | 31 | 20.0%
. | 31 | 20.0%
4 | 18 | 11.6%
1 | 18 | 11.6%
5 | 13 | 8.4%
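
The character statistics for PayWideKumi6 can be re-derived from its two observed values and their counts. The sketch below is one way to do that with the standard library; it assumes the report's "Decimal Number" and "Other Punctuation" labels correspond to the Unicode general categories `Nd` and `Po`.

```python
from collections import Counter
import unicodedata

# Observed string values of PayWideKumi6 and how often each occurs.
value_counts = {"418.0": 18, "508.0": 13}

# Count every character, weighted by how often its value occurs.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

total_characters = sum(char_counts.values())

# Group characters by Unicode general category (e.g. Nd, Po).
category_counts = Counter()
for ch, count in char_counts.items():
    category_counts[unicodedata.category(ch)] += count

for ch, count in char_counts.most_common():
    print(f"{ch!r}: {count} ({100 * count / total_characters:.1f}%)")
```

The totals agree with the report: 155 characters overall, of which 124 are digits and 31 are the decimal point.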

PayWidePay6
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 6.5%
Missing: 338561
Missing (%): > 99.9%
Memory size: 2.6 MiB

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 155
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 370.0
2nd row: 370.0
3rd row: 370.0
4th row: 370.0
5th row: 370.0

Value | Count | Frequency (%)
580.0 | 18 | < 0.1%
370.0 | 13 | < 0.1%
(Missing) | 338561 | > 99.9%

Histogram of lengths of the category

Value | Count | Frequency (%)
580.0 | 18 | 58.1%
370.0 | 13 | 41.9%

Most occurring characters

Value | Count | Frequency (%)
0 | 62 | 40.0%
. | 31 | 20.0%
5 | 18 | 11.6%
8 | 18 | 11.6%
3 | 13 | 8.4%
7 | 13 | 8.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 124 | 80.0%
Other Punctuation | 31 | 20.0%

Most frequent character per category

Value | Count | Frequency (%)
0 | 62 | 50.0%
5 | 18 | 14.5%
8 | 18 | 14.5%
3 | 13 | 10.5%
7 | 13 | 10.5%

Value | Count | Frequency (%)
. | 31 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 155 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
0 | 62 | 40.0%
. | 31 | 20.0%
5 | 18 | 11.6%
8 | 18 | 11.6%
3 | 13 | 8.4%
7 | 13 | 8.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 155 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
0 | 62 | 40.0%
. | 31 | 20.0%
5 | 18 | 11.6%
8 | 18 | 11.6%
3 | 13 | 8.4%
7 | 13 | 8.4%

PayWideNinki6
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 6.5%
Missing: 338561
Missing (%): > 99.9%
Memory size: 2.6 MiB

Length

Max length: 4
Median length: 4
Mean length: 3.580645161
Min length: 3

Characters and Unicode

Total characters: 111
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 9.0
2nd row: 9.0
3rd row: 9.0
4th row: 9.0
5th row: 9.0

Value | Count | Frequency (%)
19.0 | 18 | < 0.1%
9.0 | 13 | < 0.1%
(Missing) | 338561 | > 99.9%

Histogram of lengths of the category

Value | Count | Frequency (%)
19.0 | 18 | 58.1%
9.0 | 13 | 41.9%

Most occurring characters

Value | Count | Frequency (%)
9 | 31 | 27.9%
. | 31 | 27.9%
0 | 31 | 27.9%
1 | 18 | 16.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80 | 72.1%
Other Punctuation | 31 | 27.9%

Most frequent character per category

Value | Count | Frequency (%)
9 | 31 | 38.8%
0 | 31 | 38.8%
1 | 18 | 22.5%

Value | Count | Frequency (%)
. | 31 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 111 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
9 | 31 | 27.9%
. | 31 | 27.9%
0 | 31 | 27.9%
1 | 18 | 16.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 111 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
9 | 31 | 27.9%
. | 31 | 27.9%
0 | 31 | 27.9%
1 | 18 | 16.2%

PayWideKumi7
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 6.5%
Missing: 338561
Missing (%): > 99.9%
Memory size: 2.6 MiB

Length

Max length: 6
Median length: 6
Mean length: 5.580645161
Min length: 5

Characters and Unicode

Total characters: 173
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 514.0
2nd row: 514.0
3rd row: 514.0
4th row: 514.0
5th row: 514.0

Value | Count | Frequency (%)
1118.0 | 18 | < 0.1%
514.0 | 13 | < 0.1%
(Missing) | 338561 | > 99.9%

Histogram of lengths of the category

Value | Count | Frequency (%)
1118.0 | 18 | 58.1%
514.0 | 13 | 41.9%

Most occurring characters

Value | Count | Frequency (%)
1 | 67 | 38.7%
. | 31 | 17.9%
0 | 31 | 17.9%
8 | 18 | 10.4%
5 | 13 | 7.5%
4 | 13 | 7.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 142 | 82.1%
Other Punctuation | 31 | 17.9%

Most frequent character per category

Value | Count | Frequency (%)
1 | 67 | 47.2%
0 | 31 | 21.8%
8 | 18 | 12.7%
5 | 13 | 9.2%
4 | 13 | 9.2%

Value | Count | Frequency (%)
. | 31 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 173 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
1 | 67 | 38.7%
. | 31 | 17.9%
0 | 31 | 17.9%
8 | 18 | 10.4%
5 | 13 | 7.5%
4 | 13 | 7.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 173 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
1 | 67 | 38.7%
. | 31 | 17.9%
0 | 31 | 17.9%
8 | 18 | 10.4%
5 | 13 | 7.5%
4 | 13 | 7.5%

PayWidePay7
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 6.5%
Missing: 338561
Missing (%): > 99.9%
Memory size: 2.6 MiB

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 155
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 150.0
2nd row: 150.0
3rd row: 150.0
4th row: 150.0
5th row: 150.0

Value | Count | Frequency (%)
370.0 | 18 | < 0.1%
150.0 | 13 | < 0.1%
(Missing) | 338561 | > 99.9%

Histogram of lengths of the category

Value | Count | Frequency (%)
370.0 | 18 | 58.1%
150.0 | 13 | 41.9%

Most occurring characters

Value | Count | Frequency (%)
0 | 62 | 40.0%
. | 31 | 20.0%
3 | 18 | 11.6%
7 | 18 | 11.6%
1 | 13 | 8.4%
5 | 13 | 8.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 124 | 80.0%
Other Punctuation | 31 | 20.0%

Most frequent character per category

Value | Count | Frequency (%)
0 | 62 | 50.0%
3 | 18 | 14.5%
7 | 18 | 14.5%
1 | 13 | 10.5%
5 | 13 | 10.5%

Value | Count | Frequency (%)
. | 31 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 155 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
0 | 62 | 40.0%
. | 31 | 20.0%
3 | 18 | 11.6%
7 | 18 | 11.6%
1 | 13 | 8.4%
5 | 13 | 8.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 155 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
0 | 62 | 40.0%
. | 31 | 20.0%
3 | 18 | 11.6%
7 | 18 | 11.6%
1 | 13 | 8.4%
5 | 13 | 8.4%

PayWideNinki7
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 6.5%
Missing: 338561
Missing (%): > 99.9%
Memory size: 2.6 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 93
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Value | Count | Frequency (%)
9.0 | 18 | < 0.1%
1.0 | 13 | < 0.1%
(Missing) | 338561 | > 99.9%

Histogram of lengths of the category

Value | Count | Frequency (%)
9.0 | 18 | 58.1%
1.0 | 13 | 41.9%

Most occurring characters

Value | Count | Frequency (%)
. | 31 | 33.3%
0 | 31 | 33.3%
9 | 18 | 19.4%
1 | 13 | 14.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 62 | 66.7%
Other Punctuation | 31 | 33.3%

Most frequent character per category

Value | Count | Frequency (%)
0 | 31 | 50.0%
9 | 18 | 29.0%
1 | 13 | 21.0%

Value | Count | Frequency (%)
. | 31 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 93 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
. | 31 | 33.3%
0 | 31 | 33.3%
9 | 18 | 19.4%
1 | 13 | 14.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 93 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
. | 31 | 33.3%
0 | 31 | 33.3%
9 | 18 | 19.4%
1 | 13 | 14.0%

PayUmatanKumi1
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 306
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 800.3921475
Minimum: 102
Maximum: 1817
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 102
5-th percentile: 111
Q1: 409
Median: 804
Q3: 1201
95-th percentile: 1516
Maximum: 1817
Range: 1715
Interquartile range (IQR): 792

Descriptive statistics

Standard deviation: 448.9344477
Coefficient of variation (CV): 0.5608931186
Kurtosis: -1.025963131
Mean: 800.3921475
Median Absolute Deviation (MAD): 396
Skewness: 0.1841746278
Sum: 271006378
Variance: 201542.1383
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
203 | 2244 | 0.7%
204 | 2197 | 0.6%
108 | 2163 | 0.6%
406 | 2155 | 0.6%
302 | 2137 | 0.6%
405 | 2124 | 0.6%
809 | 2114 | 0.6%
705 | 2113 | 0.6%
403 | 2103 | 0.6%
504 | 2096 | 0.6%
Other values (296) | 317146 | 93.7%

Value | Count | Frequency (%)
102 | 2093 | 0.6%
103 | 1967 | 0.6%
104 | 2068 | 0.6%
105 | 1995 | 0.6%
106 | 1640 | 0.5%

Value | Count | Frequency (%)
1817 | 162 | < 0.1%
1816 | 128 | < 0.1%
1815 | 205 | 0.1%
1814 | 109 | < 0.1%
1813 | 161 | < 0.1%
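
Several of the statistics reported for PayUmatanKumi1 are derived from others. The following sketch recomputes them from the reported figures, assuming the usual definitions (range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × count).

```python
# Figures copied from the PayUmatanKumi1 section of this report.
n_observations = 338592
mean = 800.3921475
std = 448.9344477
minimum, q1, q3, maximum = 102, 409, 1201, 1817

value_range = maximum - minimum   # spread between the extremes
iqr = q3 - q1                     # spread of the middle 50%
cv = std / mean                   # dispersion relative to the mean
variance = std ** 2
total = mean * n_observations     # should match the reported Sum

print(f"Range: {value_range}")
print(f"IQR: {iqr}")
print(f"CV: {cv:.10f}")
print(f"Variance: {variance:.4f}")
print(f"Sum: {total:.0f}")
```

Small differences in the last digits are expected because the inputs are themselves rounded.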

PayUmatanPay1
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 4722
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12461.74216
Minimum: 170
Maximum: 1087490
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 170
5-th percentile: 680
Q1: 1720
Median: 3830
Q3: 10130
95-th percentile: 48420
Maximum: 1087490
Range: 1087320
Interquartile range (IQR): 8410

Descriptive statistics

Standard deviation: 33837.93143
Coefficient of variation (CV): 2.715345175
Kurtosis: 198.2246047
Mean: 12461.74216
Median Absolute Deviation (MAD): 2660
Skewness: 10.86805933
Sum: 4219446200
Variance: 1145005604
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
780 | 879 | 0.3%
980 | 866 | 0.3%
1230 | 860 | 0.3%
900 | 843 | 0.2%
1340 | 822 | 0.2%
1730 | 805 | 0.2%
1410 | 801 | 0.2%
1050 | 800 | 0.2%
1170 | 797 | 0.2%
1160 | 796 | 0.2%
Other values (4712) | 330323 | 97.6%

Value | Count | Frequency (%)
170 | 8 | < 0.1%
180 | 6 | < 0.1%
190 | 35 | < 0.1%
200 | 45 | < 0.1%
210 | 18 | < 0.1%

Value | Count | Frequency (%)
1087490 | 12 | < 0.1%
1025570 | 17 | < 0.1%
962580 | 10 | < 0.1%
906220 | 6 | < 0.1%
884910 | 10 | < 0.1%

PayUmatanNinki1
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 239
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.56213673
Minimum: 1
Maximum: 303
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 4
Median: 13
Q3: 35
95-th percentile: 99
Maximum: 303
Range: 302
Interquartile range (IQR): 31

Descriptive statistics

Standard deviation: 34.28358064
Coefficient of variation (CV): 1.290693628
Kurtosis: 6.955538318
Mean: 26.56213673
Median Absolute Deviation (MAD): 11
Skewness: 2.380713397
Sum: 8993727
Variance: 1175.363901
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
1 | 29404 | 8.7%
2 | 24067 | 7.1%
3 | 18109 | 5.3%
4 | 15519 | 4.6%
5 | 13378 | 4.0%
6 | 11629 | 3.4%
7 | 10973 | 3.2%
8 | 9554 | 2.8%
9 | 9399 | 2.8%
10 | 8166 | 2.4%
Other values (229) | 188394 | 55.6%

Value | Count | Frequency (%)
1 | 29404 | 8.7%
2 | 24067 | 7.1%
3 | 18109 | 5.3%
4 | 15519 | 4.6%
5 | 13378 | 4.0%

Value | Count | Frequency (%)
303 | 17 | < 0.1%
278 | 17 | < 0.1%
271 | 16 | < 0.1%
263 | 10 | < 0.1%
257 | 13 | < 0.1%

PayUmatanKumi2
Real number (ℝ≥0)

MISSING

Distinct: 72
Distinct (%): 6.6%
Missing: 337507
Missing (%): 99.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 842.5152074
Minimum: 104
Maximum: 1817
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 104
5-th percentile: 211
Q1: 511
Median: 807
Q3: 1108
95-th percentile: 1513
Maximum: 1817
Range: 1713
Interquartile range (IQR): 597

Descriptive statistics

Standard deviation: 401.3186234
Coefficient of variation (CV): 0.4763339817
Kurtosis: -0.5729754805
Mean: 842.5152074
Median Absolute Deviation (MAD): 301
Skewness: 0.2515246426
Sum: 914129
Variance: 161056.6375
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
1108 | 49 | < 0.1%
1210 | 29 | < 0.1%
811 | 28 | < 0.1%
302 | 27 | < 0.1%
710 | 27 | < 0.1%
411 | 26 | < 0.1%
1106 | 26 | < 0.1%
314 | 25 | < 0.1%
1107 | 25 | < 0.1%
804 | 24 | < 0.1%
Other values (62) | 799 | 0.2%
(Missing) | 337507 | 99.7%

Value | Count | Frequency (%)
104 | 8 | < 0.1%
113 | 14 | < 0.1%
206 | 8 | < 0.1%
208 | 12 | < 0.1%
211 | 17 | < 0.1%

Value | Count | Frequency (%)
1817 | 17 | < 0.1%
1713 | 16 | < 0.1%
1613 | 13 | < 0.1%
1513 | 12 | < 0.1%
1502 | 15 | < 0.1%

PayUmatanPay2
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 84
Distinct (%): 7.7%
Missing: 337507
Missing (%): 99.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 7203.990783
Minimum: 250
Maximum: 179670
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 250
5-th percentile: 490
Q1: 990
Median: 2150
Q3: 6180
95-th percentile: 20270
Maximum: 179670
Range: 179420
Interquartile range (IQR): 5190

Descriptive statistics

Standard deviation: 21530.96507
Coefficient of variation (CV): 2.988755222
Kurtosis: 54.36089448
Mean: 7203.990783
Median Absolute Deviation (MAD): 1390
Skewness: 7.179901283
Sum: 7816330
Variance: 463582456.8
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
2360 | 29 | < 0.1%
670 | 29 | < 0.1%
6720 | 24 | < 0.1%
710 | 21 | < 0.1%
890 | 20 | < 0.1%
990 | 19 | < 0.1%
1860 | 17 | < 0.1%
2020 | 17 | < 0.1%
860 | 16 | < 0.1%
1690 | 16 | < 0.1%
Other values (74) | 877 | 0.3%
(Missing) | 337507 | 99.7%

Value | Count | Frequency (%)
250 | 6 | < 0.1%
260 | 5 | < 0.1%
280 | 8 | < 0.1%
390 | 14 | < 0.1%
410 | 14 | < 0.1%

Value | Count | Frequency (%)
179670 | 15 | < 0.1%
35540 | 15 | < 0.1%
34870 | 14 | < 0.1%
21390 | 8 | < 0.1%
20270 | 16 | < 0.1%

PayUmatanNinki2
Real number (ℝ≥0)

MISSING

Distinct: 44
Distinct (%): 4.1%
Missing: 337507
Missing (%): 99.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.59723502
Minimum: 1
Maximum: 164
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 6
Median: 14
Q3: 36
95-th percentile: 106
Maximum: 164
Range: 163
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 35.60900048
Coefficient of variation (CV): 1.245190329
Kurtosis: 4.981792879
Mean: 28.59723502
Median Absolute Deviation (MAD): 11
Skewness: 2.217990437
Sum: 31028
Variance: 1268.000915
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=44)

Value | Count | Frequency (%)
10 | 89 | < 0.1%
4 | 74 | < 0.1%
1 | 62 | < 0.1%
7 | 62 | < 0.1%
21 | 58 | < 0.1%
14 | 50 | < 0.1%
2 | 45 | < 0.1%
3 | 40 | < 0.1%
9 | 40 | < 0.1%
30 | 37 | < 0.1%
Other values (34) | 528 | 0.2%
(Missing) | 337507 | 99.7%

Value | Count | Frequency (%)
1 | 62 | < 0.1%
2 | 45 | < 0.1%
3 | 40 | < 0.1%
4 | 74 | < 0.1%
5 | 31 | < 0.1%

Value | Count | Frequency (%)
164 | 15 | < 0.1%
160 | 15 | < 0.1%
147 | 16 | < 0.1%
106 | 10 | < 0.1%
98 | 15 | < 0.1%

PaySanrenpukuKumi1
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 804
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41120.02132
Minimum: 10203
Maximum: 161718
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 10203
5-th percentile: 10308
Q1: 20407
Median: 31013
Q3: 60712
95-th percentile: 101116
Maximum: 161718
Range: 151515
Interquartile range (IQR): 40305

Descriptive statistics

Standard deviation: 28579.28255
Coefficient of variation (CV): 0.6950211025
Kurtosis: 0.4827592712
Mean: 41120.02132
Median Absolute Deviation (MAD): 19801
Skewness: 1.016183361
Sum: 1.392291026 × 10^10
Variance: 816775391.1
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
10204 | 1447 | 0.4%
40506 | 1431 | 0.4%
10203 | 1407 | 0.4%
20405 | 1398 | 0.4%
10305 | 1390 | 0.4%
20306 | 1317 | 0.4%
10304 | 1299 | 0.4%
30405 | 1232 | 0.4%
20406 | 1226 | 0.4%
20304 | 1226 | 0.4%
Other values (794) | 325219 | 96.1%

Value | Count | Frequency (%)
10203 | 1407 | 0.4%
10204 | 1447 | 0.4%
10205 | 1193 | 0.4%
10206 | 1192 | 0.4%
10207 | 1048 | 0.3%

Value | Count | Frequency (%)
161718 | 34 | < 0.1%
151718 | 58 | < 0.1%
151618 | 124 | < 0.1%
151617 | 40 | < 0.1%
141718 | 101 | < 0.1%
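
The Kumi columns look like packed combinations rather than ordinary magnitudes: PayUmatanKumi1 runs from 102 to 1817 and PaySanrenpukuKumi1 from 10203 to 161718. The sketch below assumes (this is an interpretation, not something the report states) that each value concatenates two-digit entry numbers, so 1817 would read as 18 and 17, and 10203 as 1, 2 and 3. The helper `decode_kumi` is hypothetical.

```python
from typing import Tuple


def decode_kumi(value: int, n_entries: int) -> Tuple[int, ...]:
    """Split a packed Kumi value into n_entries two-digit numbers.

    Assumption: each entry occupies exactly two decimal digits and
    leading zeros were dropped when the value was stored as a number.
    """
    digits = str(int(value)).zfill(2 * n_entries)
    return tuple(int(digits[i:i + 2]) for i in range(0, len(digits), 2))


# Extremes reported for PayUmatanKumi1 (assumed to be pairs).
print(decode_kumi(102, 2))
print(decode_kumi(1817, 2))

# Extremes reported for PaySanrenpukuKumi1 (assumed to be triples).
print(decode_kumi(10203, 3))
print(decode_kumi(161718, 3))
```

If this reading is right, statistics such as the mean, skewness and correlations of the Kumi columns describe an encoding rather than a quantity, and treating these columns as categorical may be more appropriate.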

PaySanrenpukuPay1
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6440
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25452.03599
Minimum: 130
Maximum: 5508830
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 130
5-th percentile: 780
Q1: 2270
Median: 5860
Q3: 17700
95-th percentile: 104190
Maximum: 5508830
Range: 5508700
Interquartile range (IQR): 15430

Descriptive statistics

Standard deviation: 88463.62696
Coefficient of variation (CV): 3.475699429
Kurtosis: 742.8936376
Mean: 25452.03599
Median Absolute Deviation (MAD): 4470
Skewness: 18.97387075
Sum: 8617855770
Variance: 7825813294
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
1310 | 633 | 0.2%
1100 | 626 | 0.2%
1510 | 614 | 0.2%
1330 | 604 | 0.2%
1600 | 598 | 0.2%
1240 | 595 | 0.2%
1210 | 595 | 0.2%
920 | 593 | 0.2%
1160 | 587 | 0.2%
790 | 587 | 0.2%
Other values (6430) | 332560 | 98.2%

Value | Count | Frequency (%)
130 | 5 | < 0.1%
140 | 10 | < 0.1%
150 | 15 | < 0.1%
160 | 10 | < 0.1%
170 | 29 | < 0.1%

Value | Count | Frequency (%)
5508830 | 12 | < 0.1%
3345390 | 11 | < 0.1%
2709720 | 17 | < 0.1%
2704790 | 10 | < 0.1%
2232180 | 10 | < 0.1%

PaySanrenpukuNinki1
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 488
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 45.66631521
Minimum: 1
Maximum: 814
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 5
Median: 18
Q3: 54
95-th percentile: 191
Maximum: 814
Range: 813
Interquartile range (IQR): 49

Descriptive statistics

Standard deviation: 71.19593258
Coefficient of variation (CV): 1.559047018
Kurtosis: 13.56832045
Mean: 45.66631521
Median Absolute Deviation (MAD): 15
Skewness: 3.17141922
Sum: 15462249
Variance: 5068.860815
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
1 | 27655 | 8.2%
2 | 20055 | 5.9%
3 | 15825 | 4.7%
4 | 13045 | 3.9%
5 | 11829 | 3.5%
6 | 9613 | 2.8%
7 | 9263 | 2.7%
9 | 7989 | 2.4%
8 | 7722 | 2.3%
10 | 6698 | 2.0%
Other values (478) | 208898 | 61.7%

Value | Count | Frequency (%)
1 | 27655 | 8.2%
2 | 20055 | 5.9%
3 | 15825 | 4.7%
4 | 13045 | 3.9%
5 | 11829 | 3.5%

Value | Count | Frequency (%)
814 | 17 | < 0.1%
749 | 10 | < 0.1%
727 | 16 | < 0.1%
664 | 14 | < 0.1%
656 | 14 | < 0.1%

PaySanrenpukuKumi2
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 75
Distinct (%): 8.1%
Missing: 337668
Missing (%): 99.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 49044.23485
Minimum: 10305
Maximum: 131415
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 10305
5-th percentile: 10608
Q1: 20812
Median: 41215
Q3: 71012
95-th percentile: 101516
Maximum: 131415
Range: 121110
Interquartile range (IQR): 50200

Descriptive statistics

Standard deviation: 31652.34337
Coefficient of variation (CV): 0.64538357
Kurtosis: -0.3964455582
Mean: 49044.23485
Median Absolute Deviation (MAD): 20802
Skewness: 0.6585701435
Sum: 45316873
Variance: 1001870841
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
40818 | 30 | < 0.1%
20810 | 20 | < 0.1%
131415 | 18 | < 0.1%
101418 | 18 | < 0.1%
70816 | 17 | < 0.1%
41418 | 17 | < 0.1%
10616 | 16 | < 0.1%
51416 | 16 | < 0.1%
101315 | 16 | < 0.1%
11114 | 16 | < 0.1%
Other values (65) | 740 | 0.2%
(Missing) | 337668 | 99.7%

Value | Count | Frequency (%)
10305 | 6 | < 0.1%
10307 | 10 | < 0.1%
10507 | 15 | < 0.1%
10607 | 13 | < 0.1%
10608 | 11 | < 0.1%

Value | Count | Frequency (%)
131415 | 18 | < 0.1%
121415 | 13 | < 0.1%
111216 | 12 | < 0.1%
101516 | 11 | < 0.1%
101418 | 18 | < 0.1%

PaySanrenpukuPay2
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 76
Distinct (%): 8.2%
Missing: 337668
Missing (%): 99.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 18196.01732
Minimum: 180
Maximum: 316360
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 180
5-th percentile: 520
Q1: 2020
Median: 5490
Q3: 16360
95-th percentile: 70890
Maximum: 316360
Range: 316180
Interquartile range (IQR): 14340

Descriptive statistics

Standard deviation: 37351.78351
Coefficient of variation (CV): 2.052744997
Kurtosis: 34.20730123
Mean: 18196.01732
Median Absolute Deviation (MAD): 4430
Skewness: 5.110845654
Sum: 16813120
Variance: 1395155732
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
2010 | 21 | < 0.1%
13660 | 18 | < 0.1%
28740 | 18 | < 0.1%
2020 | 18 | < 0.1%
3240 | 17 | < 0.1%
1030 | 17 | < 0.1%
20930 | 16 | < 0.1%
122000 | 16 | < 0.1%
320 | 16 | < 0.1%
10240 | 16 | < 0.1%
Other values (66) | 751 | 0.2%
(Missing) | 337668 | 99.7%

Value | Count | Frequency (%)
180 | 4 | < 0.1%
260 | 7 | < 0.1%
320 | 16 | < 0.1%
410 | 11 | < 0.1%
480 | 7 | < 0.1%

Value | Count | Frequency (%)
316360 | 8 | < 0.1%
122000 | 16 | < 0.1%
109560 | 15 | < 0.1%
70890 | 12 | < 0.1%
63600 | 15 | < 0.1%

PaySanrenpukuNinki2
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 53
Distinct (%): 5.7%
Missing: 337668
Missing (%): 99.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 71.3030303
Minimum: 1
Maximum: 496
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 13
Median: 30
Q3: 100
95-th percentile: 224
Maximum: 496
Range: 495
Interquartile range (IQR): 87

Descriptive statistics

Standard deviation: 89.46776023
Coefficient of variation (CV): 1.254753968
Kurtosis: 7.357958273
Mean: 71.3030303
Median Absolute Deviation (MAD): 26
Skewness: 2.380019253
Sum: 65884
Variance: 8004.480121
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
13 | 45 | < 0.1%
1 | 38 | < 0.1%
21 | 38 | < 0.1%
4 | 35 | < 0.1%
68 | 30 | < 0.1%
85 | 30 | < 0.1%
5 | 29 | < 0.1%
2 | 27 | < 0.1%
19 | 27 | < 0.1%
11 | 27 | < 0.1%
Other values (43) | 598 | 0.2%
(Missing) | 337668 | 99.7%

Value | Count | Frequency (%)
1 | 38 | < 0.1%
2 | 27 | < 0.1%
3 | 7 | < 0.1%
4 | 35 | < 0.1%
5 | 29 | < 0.1%

Value | Count | Frequency (%)
496 | 16 | < 0.1%
335 | 8 | < 0.1%
266 | 15 | < 0.1%
224 | 15 | < 0.1%
196 | 12 | < 0.1%

PaySanrenpukuKumi3
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 6.5%
Missing: 338561
Missing (%): > 99.9%
Memory size: 2.6 MiB

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 217
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 50614.0
2nd row: 50614.0
3rd row: 50614.0
4th row: 50614.0
5th row: 50614.0

Value | Count | Frequency (%)
81118.0 | 18 | < 0.1%
50614.0 | 13 | < 0.1%
(Missing) | 338561 | > 99.9%

Histogram of lengths of the category

Value | Count | Frequency (%)
81118.0 | 18 | 58.1%
50614.0 | 13 | 41.9%

Most occurring characters

Value | Count | Frequency (%)
1 | 67 | 30.9%
0 | 44 | 20.3%
8 | 36 | 16.6%
. | 31 | 14.3%
5 | 13 | 6.0%
6 | 13 | 6.0%
4 | 13 | 6.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 186 | 85.7%
Other Punctuation | 31 | 14.3%

Most frequent character per category

Value | Count | Frequency (%)
1 | 67 | 36.0%
0 | 44 | 23.7%
8 | 36 | 19.4%
5 | 13 | 7.0%
6 | 13 | 7.0%
4 | 13 | 7.0%

Value | Count | Frequency (%)
. | 31 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 217 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
1 | 67 | 30.9%
0 | 44 | 20.3%
8 | 36 | 16.6%
. | 31 | 14.3%
5 | 13 | 6.0%
6 | 13 | 6.0%
4 | 13 | 6.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 217 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
1 | 67 | 30.9%
0 | 44 | 20.3%
8 | 36 | 16.6%
. | 31 | 14.3%
5 | 13 | 6.0%
6 | 13 | 6.0%
4 | 13 | 6.0%

PaySanrenpukuPay3
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 6.5%
Missing: 338561
Missing (%): > 99.9%
Memory size: 2.6 MiB

Length

Max length: 6
Median length: 6
Mean length: 5.580645161
Min length: 5

Characters and Unicode

Total characters: 173
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 590.0
2nd row: 590.0
3rd row: 590.0
4th row: 590.0
5th row: 590.0

Value | Count | Frequency (%)
1140.0 | 18 | < 0.1%
590.0 | 13 | < 0.1%
(Missing) | 338561 | > 99.9%

Histogram of lengths of the category

Value | Count | Frequency (%)
1140.0 | 18 | 58.1%
590.0 | 13 | 41.9%

Most occurring characters

Value | Count | Frequency (%)
0 | 62 | 35.8%
1 | 36 | 20.8%
. | 31 | 17.9%
4 | 18 | 10.4%
5 | 13 | 7.5%
9 | 13 | 7.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 142 | 82.1%
Other Punctuation | 31 | 17.9%

Most frequent character per category

Value | Count | Frequency (%)
0 | 62 | 43.7%
1 | 36 | 25.4%
4 | 18 | 12.7%
5 | 13 | 9.2%
9 | 13 | 9.2%

Value | Count | Frequency (%)
. | 31 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 173 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
0 | 62 | 35.8%
1 | 36 | 20.8%
. | 31 | 17.9%
4 | 18 | 10.4%
5 | 13 | 7.5%
9 | 13 | 7.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 173 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
0 | 62 | 35.8%
1 | 36 | 20.8%
. | 31 | 17.9%
4 | 18 | 10.4%
5 | 13 | 7.5%
9 | 13 | 7.5%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for an exactly linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
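
A minimal pure-Python sketch of this calculation follows. The function name `pearson_r` is illustrative and is not taken from pandas-profiling; the normalising factors of the covariance and the standard deviations cancel, so they are omitted.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance of x and y divided by the product of their std devs."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations; the common 1/n
    # factor cancels between numerator and denominator.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0]
print(pearson_r(x, [2.0, 4.0, 6.0, 8.0]))   # perfectly increasing line
print(pearson_r(x, [8.0, 6.0, 4.0, 2.0]))   # perfectly decreasing line
```

Shifting or rescaling either input by a positive factor leaves the result unchanged, which is the invariance property mentioned above.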

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
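
As a sketch of that recipe, the code below ranks both variables and then applies the Pearson formula to the ranks. Giving tied values their average rank is a common convention and an assumption here; the helper names are illustrative.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# A monotonic but nonlinear relationship (y = x squared).
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```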

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
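
The pair-counting definition translates directly into code. This sketch implements the simple form described here (often called tau-a), in which tied pairs count as neither concordant nor discordant; library implementations may use a tie-adjusted variant instead. The function name is illustrative.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Two concordant pairs and one discordant pair out of three.
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```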

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
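
The sketch below shows one way to compute a bias-corrected Cramér's V from a contingency table of counts. It assumes the correction takes the commonly cited form attributed to Bergsma (2013): the chi-squared-based φ² is reduced by (k−1)(r−1)/(n−1) and the row and column counts are shrunk accordingly. It is an illustration, not the exact code used by pandas-profiling.

```python
import math
from typing import List


def cramers_v_corrected(table: List[List[float]]) -> float:
    """Bias-corrected Cramér's V for an r x k table of counts."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfect association
print(cramers_v_corrected([[25, 25], [25, 25]]))  # independence
```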

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
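
Nullity correlation can be illustrated by correlating presence indicators (1 where a value is present, 0 where it is missing). The sketch below uses a small made-up set of rows; the column names `a`, `b`, `c` and the helper `nullity_correlation` are hypothetical, and returning `None` for columns that are always present or always missing is an assumption made to avoid dividing by zero.

```python
import math
from typing import Dict, List, Optional

# Made-up rows for illustration; None marks a missing value.
rows: List[Dict[str, Optional[float]]] = [
    {"a": 1.0, "b": 2.0, "c": None},
    {"a": None, "b": None, "c": 3.0},
    {"a": 4.0, "b": 5.0, "c": None},
    {"a": None, "b": None, "c": None},
]


def nullity_correlation(col_x: str, col_y: str) -> Optional[float]:
    """Pearson correlation between the presence indicators of two columns."""
    x = [0.0 if row[col_x] is None else 1.0 for row in rows]
    y = [0.0 if row[col_y] is None else 1.0 for row in rows]
    n = len(rows)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((v - mx) ** 2 for v in x)
    var_y = sum((v - my) ** 2 for v in y)
    if var_x == 0 or var_y == 0:
        return None  # a column with no variation in presence
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(var_x * var_y)


print(nullity_correlation("a", "b"))  # present and missing together
print(nullity_correlation("a", "c"))  # tend to be present in different rows
```

Columns that are always missing together, as the identical missing counts of the sixth and seventh Wide payout columns suggest, would score close to +1 on such a measure.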

Sample

First rows

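(index) | PayWideKumi6 | PayWidePay6 | PayWideNinki6 | PayWideKumi7 | PayWidePay7 | PayWideNinki7 | PayUmatanKumi1 | PayUmatanPay1 | PayUmatanNinki1 | PayUmatanKumi2 | PayUmatanPay2 | PayUmatanNinki2 | PaySanrenpukuKumi1 | PaySanrenpukuPay1 | PaySanrenpukuNinki1 | PaySanrenpukuKumi2 | PaySanrenpukuPay2 | PaySanrenpukuNinki2 | PaySanrenpukuKumi3 | PaySanrenpukuPay3
0 | nan | nan | nan | nan | nan | nan | 1615 | 3280 | 9 | nan | nan | nan | 51516 | 2320 | 2 | nan | nan | nan | nan | nan
1 | nan | nan | nan | nan | nan | nan | 809 | 1310 | 2 | nan | nan | nan | 10809 | 990 | 2 | nan | nan | nan | nan | nan
2 | nan | nan | nan | nan | nan | nan | 408 | 700 | 1 | nan | nan | nan | 40508 | 1020 | 2 | nan | nan | nan | nan | nan
3 | nan | nan | nan | nan | nan | nan | 1516 | 40500 | 125 | nan | nan | nan | 61516 | 96460 | 271 | nan | nan | nan | nan | nan
4 | nan | nan | nan | nan | nan | nan | 1615 | 3280 | 9 | nan | nan | nan | 51516 | 2320 | 2 | nan | nan | nan | nan | nan
5 | nan | nan | nan | nan | nan | nan | 912 | 11280 | 37 | nan | nan | nan | 91216 | 6500 | 17 | nan | nan | nan | nan | nan
6 | nan | nan | nan | nan | nan | nan | 108 | 16290 | 54 | nan | nan | nan | 10812 | 64580 | 134 | nan | nan | nan | nan | nan
7 | nan | nan | nan | nan | nan | nan | 1606 | 610 | 1 | nan | nan | nan | 60716 | 11020 | 33 | nan | nan | nan | nan | nan
8 | nan | nan | nan | nan | nan | nan | 1416 | 64500 | 116 | nan | nan | nan | 121416 | 440070 | 382 | nan | nan | nan | nan | nan
9 | nan | nan | nan | nan | nan | nan | 1408 | 900 | 1 | nan | nan | nan | 81114 | 2210 | 2 | nan | nan | nan | nan | nan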

Last rows

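(index) | PayWideKumi6 | PayWidePay6 | PayWideNinki6 | PayWideKumi7 | PayWidePay7 | PayWideNinki7 | PayUmatanKumi1 | PayUmatanPay1 | PayUmatanNinki1 | PayUmatanKumi2 | PayUmatanPay2 | PayUmatanNinki2 | PaySanrenpukuKumi1 | PaySanrenpukuPay1 | PaySanrenpukuNinki1 | PaySanrenpukuKumi2 | PaySanrenpukuPay2 | PaySanrenpukuNinki2 | PaySanrenpukuKumi3 | PaySanrenpukuPay3
338582 | nan | nan | nan | nan | nan | nan | 205 | 350 | 1 | nan | nan | nan | 20508 | 2780 | 6 | nan | nan | nan | nan | nan
338583 | nan | nan | nan | nan | nan | nan | 205 | 350 | 1 | nan | nan | nan | 20508 | 2780 | 6 | nan | nan | nan | nan | nan
338584 | nan | nan | nan | nan | nan | nan | 213 | 14000 | 45 | nan | nan | nan | 10213 | 9020 | 32 | nan | nan | nan | nan | nan
338585 | nan | nan | nan | nan | nan | nan | 815 | 62650 | 125 | nan | nan | nan | 60815 | 44910 | 122 | nan | nan | nan | nan | nan
338586 | nan | nan | nan | nan | nan | nan | 402 | 2310 | 5 | nan | nan | nan | 20412 | 3320 | 6 | nan | nan | nan | nan | nan
338587 | nan | nan | nan | nan | nan | nan | 1008 | 1770 | 7 | nan | nan | nan | 10810 | 11850 | 37 | nan | nan | nan | nan | nan
338588 | nan | nan | nan | nan | nan | nan | 1006 | 3550 | 14 | nan | nan | nan | 20610 | 1710 | 3 | nan | nan | nan | nan | nan
338589 | nan | nan | nan | nan | nan | nan | 714 | 1200 | 1 | nan | nan | nan | 60714 | 1610 | 3 | nan | nan | nan | nan | nan
338590 | nan | nan | nan | nan | nan | nan | 509 | 11860 | 34 | nan | nan | nan | 40509 | 64350 | 135 | nan | nan | nan | nan | nan
338591 | nan | nan | nan | nan | nan | nan | 1009 | 1890 | 4 | nan | nan | nan | 91014 | 56740 | 129 | nan | nan | nan | nan | nan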